Levenshtein Distance Technique in Dictionary Lookup Methods: An Improved Approach

نویسندگان

  • Rishin Haldar
  • Debajyoti Mukhopadhyay
چکیده

Dictionary lookup methods are popular in dealing with ambiguous letters which were not recognized by Optical Character Readers. However, a robust dictionary lookup method can be complex as apriori probability calculation or a large dictionary size increases the overhead and the cost of searching. In this context, Levenshtein distance is a simple metric which can be an effective string approximation tool. After observing the effectiveness of this method, an improvement has been made to this method by grouping some similar looking alphabets and reducing the weighted difference among members of the same group. The results showed marked improvement over the traditional Levenshtein distance technique.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Error Correction for Arabic Dictionary Lookup

We describe a new Arabic spelling correction system which is intended for use with electronic dictionary search by learners of Arabic. Unlike other spelling correction systems, this system does not depend on a corpus of attested student errors but on studentand teacher-generated ratings of confusable pairs of phonemes or letters. Separate error modules for keyboard mistypings, phonetic confusio...

متن کامل

Fast Approximate Search in Large Dictionaries

The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k. In this article we describe methods for efficiently s...

متن کامل

Mapping Words Between Slovak Text and its Translation to English

Word alignment in texts translated to different languages is used in various applications such as cross-language information retrieval. To search for equivalent words in text translations various statistical methods, methods based on position of words in phrases and methods based on bilingual dictionaries are used. However it is very difficult to use these methods in languages with big morpholo...

متن کامل

A Hybrid Approach to Align Sentences and Words in English-Hindi Parallel Corpora

In this paper we describe an alignment system that aligns English-Hindi texts at the sentence and word level in parallel corpora. We describe a simple sentence length approach to sentence alignment and a hybrid, multi-feature approach to perform word alignment. We use regression techniques in order to learn parameters which characterise the relationship between the lengths of two sentences in p...

متن کامل

Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion

The rapidly increasing biomedical literature calls for the need of an automatic approach in the recognition and normalization of disease mentions in order to increase the precision and effectivity of disease based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1101.1232  شماره 

صفحات  -

تاریخ انتشار 2011